Abstract

The Jelinski-Moranda (JM) model for software reliability is
examined. We suggest that a major reason for the poor results
given by this model is the poor performance of the maximum
likelihood method (ML) of parameter estimation. A reparameterisation and
Bayesian analysis, involving a slight modelling change, are proposed.
It is shown that this new Bayesian Jelinski-Moranda model (BJM)
is mathematically quite tractable, and several metrics of interest
to practitioners are obtained. A comparison of the BJM and JM
models was carried out using several sets of real software failure
data collected by Musa. In all cases the BJM model gave superior
reliability predictions.

We discuss ways in which the assumptions underlying both 
models can be changed in order to represent the debugging process 
more accurately. 
1. Introduction

The first software reliability growth models appeared more
than ten years ago [1,2], but they seem not to have gained
acceptance by practitioners. The reasons for this disappointing 
performance have not been widely reported in the literature (perhaps 
as a result of the unfortunate tendency of scientific journals to 
concentrate on "positive" results). It seems clear, though, that 
the need for software reliability measurement techniques is not 
disputed, so perhaps we should look to the poor performance of 
the models to explain the lack of acceptance. 

In this paper we shall examine the Jelinski-Moranda (JM)
model [1], possibly the earliest and certainly one of the
best-known models. Although our remarks will be addressed to the JM
model, it should be borne in mind that other models are similar to,
or dependent upon, the JM model. Shooman's work, for example
[2,3], seems to have paralleled that of Jelinski and Moranda.

The model due to Musa [4] used the JM model as a basis, but 
introduced many important refinements. These refinements make this 
model particularly attractive to users, but its validity must 
ultimately rest upon the validity of its JM foundation. Goel and 
Okumoto [5] also generalise the JM model. Goel [6] casts the 
JM model assumptions into a different probabilistic structure. 

We believe that our remarks about the JM model also concern this 
work. 


The JM model often gives misleading answers when the method of
maximum likelihood (ML) is used to estimate the parameters. We
present a Bayesian modification to the model which overcomes a 
major source of difficulty. In our conclusion we suggest how the 
model might be further improved by changes to one of the basic 
underlying assumptions; we hope to report on this new model in a 
future paper. 
2. The JM model

2.1 Model assumptions

The JM model, in common with most early models, treats
the program as a black box with special characteristics supposedly
representative of the special properties of software. No account
is taken of the internal structure of the program. The only
input to the model is the sequence of execution times between
successive failures (see [4] for a cogent argument in favour of
execution time): $t_1, t_2, \ldots$. The objective is to estimate
current and future reliability on the basis of these past
inter-failure times. The problem, then, is one of estimating and
predicting reliability growth.

Assumptions made in the JM model are:

1. The random variables $T_i$ ($i = 1, 2, \ldots$), representing successive
inter-failure execution times, are independent, with exponential
distributions:

    $$\mathrm{pdf}(t_i \mid \lambda_i) = \lambda_i e^{-\lambda_i t_i} \qquad (\lambda_i > 0,\ t_i > 0) \tag{1}$$

2. At each failure, a fault is fixed instantaneously, with the
result that the failure rate improves. All such improvements are
of equal size so that

    $$\lambda_i = (N - i + 1)\phi \tag{2}$$

where $N$ = initial number of faults in the program,
      $\phi$ = change (improvement) in failure rate at each fix.

See Figure 1.
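Assumptions 1 and 2 are easy to simulate, which is convenient for experimenting with the estimation difficulties discussed in the next section. The following is a minimal Python sketch, not the authors' program; the function names `jm_rates` and `simulate_jm` and the `seed` argument are illustrative assumptions.

```python
import random

def jm_rates(n_faults, phi):
    """Failure rates lambda_i = (N - i + 1) * phi for i = 1..N, as in (2)."""
    return [(n_faults - i + 1) * phi for i in range(1, n_faults + 1)]

def simulate_jm(n_faults, phi, seed=0):
    """Draw one sequence of inter-failure times t_1..t_N under (1) and (2)."""
    rng = random.Random(seed)  # seeded so that repeated runs give the same data
    return [rng.expovariate(rate) for rate in jm_rates(n_faults, phi)]

times = simulate_jm(n_faults=30, phi=0.01)
print(len(times))
```

Because the rates decrease by the same amount at each fix, the simulated times tend to lengthen as debugging proceeds, although any single run can still look like it shows no growth.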
Readers familiar with hardware reliability growth literature
will notice that in these assumptions the repair rule is spelled
out precisely, in contrast to models based on Duane's empirical
postulate [8]. There, continuous reliability growth is allowed
via a non-homogeneous Poisson process.

A detailed analysis of these assumptions has been given by
one of us elsewhere [9]. Briefly, assumption #1 seems a
plausible way of modelling our uncertainty about the nature of the
input stream which the program must process. Assumption #2,
representing the effect of successive fixes, appears less plausible.
A stochastic process would seem to be a better way of representing
the sequence $\{\lambda_i\}$ than the deterministic sequence, (2). After
all, even in those circumstances where we can guarantee to have
carried out a successful fix, we shall be uncertain as to its
effect on the failure rate of the program (have we eliminated a
large fault or a small one?). This is a theme we shall return
to later in the paper.
2.2 Difficulties associated with using the model

There seem to be three main areas of difficulty. They
concern the properties of the maximum likelihood estimates of
the parameters, and the quality of reliability predictions.

1. $\hat{N}$, the ML estimator of $N$, is occasionally infinite.
Since $\hat{N}$ and $\hat{\phi}$ are obtained by a numerical optimisation of the
likelihood function, a user can easily interpret this effect as
non-convergence of the optimisation routine [10]. Littlewood
and Verrall [11], however, show that in certain circumstances
the unique true maximum of the likelihood function will be at
$\hat{N} = \infty$, $\hat{\phi} = 0$ (with finite, non-zero $\hat{\lambda} = \hat{N}\hat{\phi}$). A necessary and
sufficient condition for this is shown to be that the least
squares regression line of $t_i$ versus $i$ has non-positive
slope. The condition is intuitively appealing: it suggests
that the JM model, being a reliability growth model, will give
nonsensical answers unless the data exhibits reliability growth.

It needs to be said, though, that even when we simulate data
from the JM model (finite $N$, non-zero $\phi$) there is a non-zero
probability that a particular data set will show no growth
according to this condition.

In real data sets, this problem does not often arise except
at early stages in debugging, i.e. when the sample size is small.
This is presumably because most data sets come from programs
which are genuinely improving in reliability. We have, however,
encountered one set of real-life data, System 5 in Musa's collection
[12], where the effect persists for the first 150 failures.
In order to handle situations of this kind, it would be necessary
to use a more general model which could estimate reliability
growth or reliability decay, for example that described by
Littlewood and Verrall (LV) [7,22].

2. A more serious problem is that often $\hat{N}$ is only slightly
larger than $n$, the sample size (number of failures experienced,
number of faults fixed). Thus, estimates of $N$ based on
increasing amounts of information usually increase with $n$.
This raises doubts about the consistency of the ML estimators,
but it is questionable whether such a concept has any meaning in
this context: the size of the "sample", $n$, is bounded above by
a parameter, $N$. Forman and Singpurwalla [14] have shown that
$\hat{N}$ and $\hat{\phi}$ can only be trusted near the end of debugging, i.e.
when almost all faults have been removed and the true value of
$N$ is only slightly larger than $n$. This observation, however,
is of little practical value since we would never know the end
of debugging was near. It is certainly not the case that $\hat{N}$
takes values close to $n$ only near the end of debugging.

At its most serious, this effect results in $\hat{N} = n$
exactly for a range of values of $n$. Thus, the ML estimator
suggests that the last fault has been removed and the program is
perfect even when this is far from being the case. Table 1
shows this in an analysis of Musa's System 3 data [12]. From
failure number 25 onwards, successive estimates of $N$ tell us
that the program is perfect; and each time the program reveals
its imperfections by failing. A similar effect occurs in
Musa's System 40, where $\hat{N} = n$ between $n = 76$ and $n = 99$.
In these cases, it is obvious that the model cannot give good
reliability predictions: it will estimate the reliability to be
1, the mttf to be infinite, the failure rate zero etc.

3. This brings us to the last, and perhaps most important, problem.
In almost every data set we have analysed, the model has produced
results which are too optimistic: it seems always to predict the
reliability to be greater than it really is. Clearly, this
effect will not be independent of the effect described in the
previous paragraph: if ML gives poor estimates of the parameters,
it seems likely that the resulting estimates of reliability
metrics will be poor. On the other hand, if the modelling
assumptions are wrong, we shall obtain poor reliability predictions
however we make inference about the model parameters.
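The first two difficulties can be reproduced on small data sets. The sketch below is an illustration under stated assumptions, not the optimisation routine of [10]: it treats $N$ as an integer, replaces $\phi$ by its ML value for each candidate $N$ (from (1) and (2) this is $n / \sum_i (N-i+1)t_i$), and searches a finite grid, so a result equal to the cap `n_max` stands in for an infinite estimate. The function names and the cap are hypothetical.

```python
from math import log

def regression_slope(times):
    """Least-squares slope of t_i against i = 1..n."""
    n = len(times)
    i_bar = (n + 1) / 2
    num = sum((i - i_bar) * t for i, t in enumerate(times, start=1))
    den = sum((i - i_bar) ** 2 for i in range(1, n + 1))
    return num / den

def profile_loglik(times, n_faults):
    """JM log-likelihood at integer N, with phi set to its ML value given N."""
    n = len(times)
    weights = [n_faults - i + 1 for i in range(1, n + 1)]
    phi_hat = n / sum(w * t for w, t in zip(weights, times))
    return sum(log(w) for w in weights) + n * log(phi_hat) - n

def ml_n(times, n_max=1000):
    """Grid search for N-hat over n..n_max; a result of n_max means the cap was hit."""
    n = len(times)
    return max(range(n, n_max + 1), key=lambda big_n: profile_loglik(times, big_n))

growth = [1, 2, 4, 8, 16]   # made-up data with lengthening times
decay = [16, 8, 4, 2, 1]    # the same values reversed: no growth
print(regression_slope(growth) > 0, ml_n(growth, n_max=200))
print(regression_slope(decay) <= 0, ml_n(decay, n_max=200))
```

On the made-up growth data the search returns $\hat{N} = n$, the "perfect program" verdict described in item 2; on the reversed data the slope is negative and the profile likelihood is still rising at the cap.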

Our intention in what follows is to improve upon the results
which can be obtained by using ML on the JM model. Our
Bayesian approach to inference necessitates a slight change to
the model itself, but we believe this to be sufficiently minor
as to justify calling it a Bayesian Jelinski-Moranda model.
3. The Bayesian Jelinski-Moranda (BJM) Model

We begin by reparameterisation to $(\lambda, \phi)$ where

    $$\lambda = N\phi \tag{3}$$

is the initial failure rate of the program. A formal motivation
for this parameterisation can be found in [11] and has already
been mentioned: even when the likelihood has its maximum at
infinity in $N$, $\hat{\lambda}$ is finite and non-zero. An informal
justification comes from inspection of Figure 1. Our data will
always concern the earlier stages of debugging, and the
statistical problem is one of fitting $\lambda_i$ as a linear function of
$i$ (failure number). Since our data will concern the left of
this plot, it seems plausible that we can obtain good estimates
of the intercept on the vertical axis: namely, $\lambda$. Estimation
of $N$, however, implies estimation of the intercept on the
horizontal axis. Such estimation will involve large errors as
a result of quite small errors in estimation of $\phi$, the slope
of the line. Notice that this reasoning explains the observation
of Forman and Singpurwalla [14] that estimates of $N$ can be
trusted only near the end of debugging. Although the argument
above is a plausible reason for the poor quality of ML estimates
of $N$, it does not explain why the estimates tend to be too small.
We shall discuss this point later.

We shall adopt a Bayesian approach to the inference, with
independent Gamma priors for $\lambda$ and $\phi$. Since $\lambda$ is no longer
constrained to be an integer multiple of $\phi$, this involves a
slight change to the model. Instead of assumptions 1 and 2 of
the JM model, the BJM model assumes:

The successive inter-failure execution times, $T_1, T_2, T_3, \ldots$
are independent random variables, exponentially distributed with
parameters $\lambda, \lambda - \phi, \lambda - 2\phi, \ldots$.

The main effect of this is that the repair rule changes at
the last failure. Each fix except the last removes an amount $\phi$
from the failure rate of the program. When the failure rate is
less than or equal to $\phi$, the next fix makes the program
"perfect" (zero failure rate). It seems likely that, except for
programs with a very small number of faults, the BJM and JM
models will be very similar.
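The modified repair rule can be written down in a few lines. This is only an illustrative sketch: the function name `bjm_rates` is an assumption, and exact fractions are used purely to keep the printed values tidy.

```python
from fractions import Fraction

def bjm_rates(lam, phi, count):
    """Failure rates after 0, 1, ..., count-1 fixes: lam - k*phi, floored at zero.

    A rate of zero means the program is "perfect" and no further failure occurs.
    """
    return [max(lam - k * phi, 0) for k in range(count)]

# lam is not an integer multiple of phi, so the last fix removes less than phi.
print(bjm_rates(Fraction(5, 2), Fraction(1), 5))
# With lam = N * phi the sequence coincides with the JM rates (2) for N = 3.
print(bjm_rates(3, 1, 5))
```

In the first call the rates are 5/2, 3/2, 1/2 and then zero: the third fix removes only 1/2, which is the "change at the last failure" described above.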

We now let the prior pdf of $(\lambda, \phi)$ be

    $$\mathrm{prior}(\lambda, \phi) = \mathrm{prior}(\lambda)\,\mathrm{prior}(\phi) \tag{3}$$

where prior($\lambda$) is Gamma($\lambda$; $b$, $c$) and prior($\phi$) is Gamma($\phi$; $f$, $g$),
i.e.

    $$\mathrm{prior}(\lambda) = \frac{c^{b} \lambda^{b-1} e^{-c\lambda}}{\Gamma(b)} \qquad (\lambda > 0) \tag{4}$$

and

    $$\mathrm{prior}(\phi) = \frac{g^{f} \phi^{f-1} e^{-g\phi}}{\Gamma(f)} \qquad (\phi > 0) \tag{5}$$

The hyperparameters $b$, $c$, $f$ and $g$ are to be chosen by
the user according to his prior knowledge and subject to the
constraints that all are positive and $b$ is an integer. This
last condition is for mathematical tractability alone, but is
probably not unduly constraining. Elicitation of prior
knowledge in order to give values to the hyperparameters is not
easy. In our own work we have used "ignorance priors" for $\lambda$
and $\phi$: non-informative (improper) uniform distributions obtained
by letting $c, g \to 0$ and $f = b = 1$. We shall proceed in the main
body of the text to adopt this simplification. The more general
results using the full gamma priors are relegated to the
appendix.

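We shall assume, then, that

    $$\mathrm{prior}(\lambda, \phi) = 1 \qquad (\lambda, \phi > 0) \tag{6}$$

and that we have observed $t_1, t_2, \ldots, t_n$. We have

    $$\mathrm{posterior}(\lambda, \phi) = p(\lambda, \phi \mid t_1, \ldots, t_n) = C\, p(t_1, \ldots, t_n \mid \lambda, \phi)\,\mathrm{prior}(\lambda, \phi) \tag{7}$$

by Bayes theorem, where

    $$C^{-1} = \iint p(t_1, \ldots, t_n \mid \lambda, \phi)\,\mathrm{prior}(\lambda, \phi)\, d\lambda\, d\phi \tag{8}$$

and the likelihood function is

    $$p(t_1, \ldots, t_n \mid \lambda, \phi) = \lambda(\lambda - \phi)\cdots(\lambda - [n-1]\phi)\,\exp\{-\lambda t_1 - (\lambda - \phi)t_2 - \cdots - (\lambda - [n-1]\phi)t_n\} \quad \text{if } \lambda > (n-1)\phi \tag{9}$$

and $= 0$ otherwise.

If we define coefficients $a_{i,n}$ by

    $$\prod_{i=1}^{n} (x + i) = \sum_{i=0}^{n} a_{i,n}\, x^{n-i} \tag{10}$$

a little analysis shows that the posterior distribution is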
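    $$p(\lambda, \phi \mid t_1, \ldots, t_n) = \frac{\displaystyle\sum_{i=0}^{n-1} a_{i,n-1}\,(-1)^{i}\,\lambda^{n-i}\phi^{i}\; e^{-\lambda \sum_{j=1}^{n} t_j}\; e^{\phi \sum_{j=1}^{n} (j-1)t_j}}{\displaystyle\sum_{i=0}^{n-1} \frac{a_{i,n-1}\; i!\,(n-i)!}{\left(\sum_{j=1}^{n} t_j\right)^{n-i+1}\left(\sum_{j=1}^{n} (n-j)t_j\right)^{i+1}}} \tag{11}$$

for $\lambda > (n-1)\phi$ and zero otherwise.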


This expression is much more tractable than might appear at
first glance. In the next section we shall obtain analytic
expressions for many of the reliability metrics which are of
practical interest. It is surprising, in fact, that the
computational difficulties associated with the BJM model are
considerably less than those associated with the numerical
optimisations required by ML estimation in the JM model.

The coefficients defined in (10) are closely related to
Stirling numbers of the first kind [15], and are most
easily computed from the relation

    $$a_{i,n} = a_{i,n-1} + n\, a_{i-1,n-1} \tag{12}$$

noting that $a_{i,i} = i!$ and $a_{0,n} = 1\ \forall n$.
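The recurrence can be coded directly with integer arithmetic. The sketch below assumes the reading of (10) in which $a_{i,n}$ is the coefficient of $x^{n-i}$ in $\prod_{i=1}^{n}(x+i)$, which is the reading consistent with $a_{0,n} = 1$; the function name `a_coeffs` is illustrative.

```python
def a_coeffs(n):
    """Return [a_{0,n}, ..., a_{n,n}] with prod_{i=1..n}(x+i) = sum_i a_{i,n} x^(n-i)."""
    a = [1]  # n = 0: the empty product
    for m in range(1, n + 1):
        prev = a + [0]  # pad so that a_{m,m-1} = 0
        # a_{i,m} = a_{i,m-1} + m * a_{i-1,m-1}
        a = [prev[i] + (m * prev[i - 1] if i else 0) for i in range(m + 1)]
    return a

print(a_coeffs(3))  # (x+1)(x+2)(x+3) = x^3 + 6x^2 + 11x + 6
```

Python integers are unbounded, so the rapid growth of the coefficients with $n$ causes no overflow here.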
4. Using the BJM model

In this section we show how the BJM model can be used to
calculate metrics of practical value. We begin with measures
of current reliability which will be used in the next section
to compare the performance of this model with the JM model.

4.1 Current reliability

Having observed $n$ failures, and carried out $n$ fixes,
the simplest question we can ask is: how reliable is the program
now? The various ways in which this question can be answered
all involve statements about the random variable $T_{n+1}$, the
time of failure-free execution until the next failure of the
program. Consider the reliability function

    $$R_{n+1}(t \mid \lambda, \phi) = P(T_{n+1} > t \mid \lambda, \phi) \tag{13}$$

    $$= e^{-(\lambda - n\phi)t} \quad \text{if } \lambda > n\phi \tag{14}$$

    $$= 1 \quad \text{if } (n-1)\phi < \lambda \le n\phi \tag{15}$$

remembering that $T_n$ is not observed if $\lambda \le (n-1)\phi$.

In our comparison between JM and BJM we shall use the
posterior mean of this:

    $$R_{n+1}(t \mid t_1, \ldots, t_n) = \iint R_{n+1}(t \mid \lambda, \phi)\; p(\lambda, \phi \mid t_1, \ldots, t_n)\, d\lambda\, d\phi \tag{16}$$

This can be interpreted as the reliability function calculated
from the posterior distribution of $T_{n+1}$.
Substituting (11), (14) and (15) into (16) and
simplifying considerably, we obtain

    $$R_{n+1}(t \mid t_1, \ldots, t_n) = 1 + C \sum_{i=0}^{n} \frac{a_{i,n}\,(n-i)!\; i!}{\left(\sum_{j=1}^{n} (n-j+1)t_j\right)^{i+1}} \left[ \frac{1}{\left(t + \sum_{j=1}^{n} t_j\right)^{n-i+1}} - \frac{1}{\left(\sum_{j=1}^{n} t_j\right)^{n-i+1}} \right] \tag{17}$$

where $C^{-1}$ is

    $$C^{-1} = \sum_{i=0}^{n-1} \frac{a_{i,n-1}\; i!\,(n-i)!}{\left(\sum_{j=1}^{n} t_j\right)^{n-i+1}\left(\sum_{j=1}^{n} (n-j)t_j\right)^{i+1}} \tag{18}$$

By differentiating (17) it can be shown that the posterior
distribution of $T_{n+1}$ is a mixture of Pareto distributions.

The posterior probability that the program is now "perfect",
i.e. that the last error has been removed, is

    $$P_0 = P\big((n-1)\phi < \lambda \le n\phi \mid t_1, \ldots, t_n\big) = 1 - C \sum_{i=0}^{n} \frac{a_{i,n}\,(n-i)!\; i!}{\left(\sum_{j=1}^{n} t_j\right)^{n-i+1}\left(\sum_{j=1}^{n} (n-j+1)t_j\right)^{i+1}} \tag{19}$$

We shall see, in the case of Musa's System 3 and System 40 data (see
the comments in Section 5), that this expression gives much more plausible
answers than the JM model.









Some care has to be taken in calculating these expressions
(17-19) because the coefficients $a_{i,n}$ can be extremely large.
However, very little machine time is required. The authors have
a Fortran program which is available to readers on request.
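One way to sidestep the size of the coefficients is exact rational arithmetic. The following Python sketch is not the authors' Fortran program; it evaluates (17)-(19) as reconstructed from the surrounding derivation, under the uniform prior (6), and the function names are assumptions. It needs $n \ge 2$, since for $n = 1$ the sum $\sum_j (n-j)t_j$ in (18) is zero.

```python
from fractions import Fraction
from math import factorial

def a_coeffs(n):
    """[a_{0,n}, ..., a_{n,n}] from the recurrence (12)."""
    a = [1]
    for m in range(1, n + 1):
        prev = a + [0]
        a = [prev[i] + (m * prev[i - 1] if i else 0) for i in range(m + 1)]
    return a

def mixture(coef, n, s, w):
    """sum_i coef_i * i! * (n-i)! / (s^(n-i+1) * w^(i+1))."""
    return sum(c * factorial(i) * factorial(n - i) / (s ** (n - i + 1) * w ** (i + 1))
               for i, c in enumerate(coef))

def bjm_current(times, t):
    """Return (R_{n+1}(t | data), P_0) for n >= 2 observed inter-failure times."""
    ts = [Fraction(x) for x in times]      # exact arithmetic, no overflow or rounding
    n, s = len(ts), sum(ts)
    v = sum((n - j) * x for j, x in enumerate(ts, start=1))  # sum_j (n-j) t_j
    w = v + s                                                 # sum_j (n-j+1) t_j
    c = 1 / mixture(a_coeffs(n - 1), n, s, v)                 # (18)
    a = a_coeffs(n)
    p0 = 1 - c * mixture(a, n, s, w)                          # (19)
    rel = p0 + c * mixture(a, n, s + Fraction(t), w)          # (17)
    return rel, p0

rel, p0 = bjm_current([1, 1], t=1)
print(rel, p0)  # 20/27 14/27
```

With only two made-up observations the posterior puts more than half its mass on a perfect program, which reflects the flat prior rather than any real evidence; the example is meant only to check the arithmetic by hand.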


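An alternative to using the posterior mean of the conditional
reliability function, (13), is to set ourselves a reliability
target and then ask how strong is our posterior belief that this
target has been achieved. If we let the target reliability
be a pair of numbers $(t, r)$ such that

    $$P(T > t) > r \tag{20}$$

it can be seen that our posterior probability that this has now
been achieved is

    $$P\{R_{n+1}(t \mid \lambda, \phi) = 1 \text{ or } r < R_{n+1}(t \mid \lambda, \phi) < 1 \mid t_1, \ldots, t_n\}$$

    $$= P_0 + P\{r < R_{n+1}(t \mid \lambda, \phi) < 1 \mid t_1, \ldots, t_n\} = 1 - P\{R_{n+1}(t \mid \lambda, \phi) < r \mid t_1, \ldots, t_n\} \tag{21}$$

where

    $$P\{R_{n+1}(t \mid \lambda, \phi) < r \mid t_1, \ldots, t_n\} = P\left\{\lambda - n\phi > -\frac{\log r}{t} \,\Big|\, t_1, \ldots, t_n\right\}$$

    $$= C \sum_{i=0}^{n} \frac{a_{i,n}\; i!}{\left(\sum_{j=1}^{n} (n-j+1)t_j\right)^{i+1}} \int_{x = -\frac{\log r}{t}}^{\infty} x^{n-i}\, e^{-x \sum_{j=1}^{n} t_j}\, dx$$

    $$= C \sum_{i=0}^{n} \frac{a_{i,n}\; i!\,(n-i)!}{\left(\sum_{j=1}^{n} t_j\right)^{n-i+1}\left(\sum_{j=1}^{n} (n-j+1)t_j\right)^{i+1}}\; \exp\left\{\frac{\sum_{j=1}^{n} t_j \log r}{t}\right\} \sum_{k=0}^{n-i} \frac{1}{k!}\left(-\frac{\sum_{j=1}^{n} t_j \log r}{t}\right)^{k} \tag{22}$$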
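The closed form (22) is a finite sum, so the probability of having met a target $(t, r)$ is cheap to evaluate. The sketch below follows the reconstruction of (21)-(22) given by the derivation (one minus the probability that $\lambda - n\phi$ exceeds $-\log r / t$); the function names are assumptions, and floating point is used only for the exponential tail.

```python
from fractions import Fraction
from math import exp, factorial, log

def a_coeffs(n):
    """[a_{0,n}, ..., a_{n,n}] from the recurrence (12)."""
    a = [1]
    for m in range(1, n + 1):
        prev = a + [0]
        a = [prev[i] + (m * prev[i - 1] if i else 0) for i in range(m + 1)]
    return a

def prob_target_met(times, t, r):
    """Posterior probability that R_{n+1}(t | lambda, phi) is at least r (n >= 2)."""
    ts = [Fraction(x) for x in times]
    n, s = len(ts), sum(ts)
    v = sum((n - j) * x for j, x in enumerate(ts, start=1))  # sum_j (n-j) t_j
    w = v + s                                                 # sum_j (n-j+1) t_j
    c = 1 / sum(a * factorial(i) * factorial(n - i) / (s ** (n - i + 1) * v ** (i + 1))
                for i, a in enumerate(a_coeffs(n - 1)))       # (18)
    z = -float(s) * log(r) / t      # lower limit -log(r)/t multiplied by sum_j t_j
    miss = 0.0                      # accumulates P{R < r | data}, equation (22)
    for i, a in enumerate(a_coeffs(n)):
        weight = c * a * factorial(i) * factorial(n - i) / (s ** (n - i + 1) * w ** (i + 1))
        tail = exp(-z) * sum(z ** k / factorial(k) for k in range(n - i + 1))
        miss += float(weight) * tail
    return 1.0 - miss

print(prob_target_met([1, 1], t=1.0, r=0.5))
```

Setting $r = 1$ demands a perfect program, so the function then returns $P_0$ from (19), which gives a convenient consistency check.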
4.2 Current failure rate

The current failure rate of the program, when the $n$th fault
has been fixed, is given by

    $$\Lambda = \begin{cases} \lambda - n\phi & \text{if } \lambda - n\phi > 0 \\ 0 & \text{otherwise.} \end{cases} \tag{23}$$

We can find the posterior pdf of $\Lambda$ conditional on the
program not being perfect as follows:

    $$p(\Lambda \mid \Lambda > 0, t_1, \ldots, t_n) = \frac{p(\Lambda, \Lambda > 0 \mid t_1, \ldots, t_n)}{P(\Lambda > 0 \mid t_1, \ldots, t_n)} \tag{24}$$

The denominator has already been evaluated, (19). We can find
the numerator by transforming (11) from $(\lambda, \phi)$ to $(\Lambda, \phi)$ and
integrating out $\phi$. Then (24) becomes

    $$B \sum_{i=0}^{n} \frac{a_{i,n}\; i!}{\left[\sum_{j=1}^{n} (n-j+1)t_j\right]^{i+1}}\; \Lambda^{n-i}\, e^{-\Lambda \sum_{j=1}^{n} t_j} \qquad (\Lambda > 0) \tag{25}$$

where $B$ is a normalising constant. This is a mixture of
Gamma($\Lambda$; $n-i+1$, $\sum_j t_j$) densities. If our reliability target has
been formulated in terms of a target failure rate, $l$ say, (19),
(23), (25) can be used to obtain the probability that the target
has been achieved, $P(\Lambda < l)$. This calculation involves
evaluation of incomplete gamma integrals for which tables are
available.

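A simpler procedure is to calculate the posterior expected
value of $\Lambda$. This can be interpreted as the failure rate obtained
at the origin of the posterior distribution of $T_{n+1}$:

    $$\lim_{\delta t \to 0} \frac{P(T_{n+1} < \delta t \mid t_1, \ldots, t_n)}{\delta t} = -\left.\frac{\partial R_{n+1}(t \mid t_1, \ldots, t_n)}{\partial t}\right|_{t=0} \tag{26}$$

This is the "current posterior failure rate", and is given by

    $$E(\Lambda \mid t_1, \ldots, t_n) = E(\Lambda \mid \Lambda > 0, t_1, \ldots, t_n)\; P(\Lambda > 0 \mid t_1, \ldots, t_n) \tag{27}$$

where

    $$P(\Lambda > 0 \mid t_1, \ldots, t_n) = C \sum_{i=0}^{n} \frac{a_{i,n}\,(n-i)!\; i!}{\left(\sum_{j=1}^{n} t_j\right)^{n-i+1}\left(\sum_{j=1}^{n} (n-j+1)t_j\right)^{i+1}} \tag{28}$$

from (19), with $C$ as in (18), and

    $$E(\Lambda \mid \Lambda > 0, t_1, \ldots, t_n) = \frac{\displaystyle\sum_{i=0}^{n} \frac{a_{i,n}\,(n-i+1)!\; i!}{\left(\sum_{j=1}^{n} t_j\right)^{n-i+2}\left(\sum_{j=1}^{n} (n-j+1)t_j\right)^{i+1}}}{\displaystyle\sum_{i=0}^{n} \frac{a_{i,n}\,(n-i)!\; i!}{\left(\sum_{j=1}^{n} t_j\right)^{n-i+1}\left(\sum_{j=1}^{n} (n-j+1)t_j\right)^{i+1}}} \tag{29}$$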
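Equations (27)-(29) share almost all of their terms, so both the unconditional and the conditional expected failure rate come out of one pass over the coefficients. This is a sketch under the same assumptions as before (uniform prior, $n \ge 2$, formulas as reconstructed from the derivation); the function names are illustrative.

```python
from fractions import Fraction
from math import factorial

def a_coeffs(n):
    """[a_{0,n}, ..., a_{n,n}] from the recurrence (12)."""
    a = [1]
    for m in range(1, n + 1):
        prev = a + [0]
        a = [prev[i] + (m * prev[i - 1] if i else 0) for i in range(m + 1)]
    return a

def expected_failure_rate(times):
    """Return (E[Lambda | data], E[Lambda | Lambda > 0, data]) as exact fractions."""
    ts = [Fraction(x) for x in times]
    n, s = len(ts), sum(ts)
    v = sum((n - j) * x for j, x in enumerate(ts, start=1))  # sum_j (n-j) t_j
    w = v + s                                                 # sum_j (n-j+1) t_j
    c = 1 / sum(a * factorial(i) * factorial(n - i) / (s ** (n - i + 1) * v ** (i + 1))
                for i, a in enumerate(a_coeffs(n - 1)))       # (18)
    a_n = a_coeffs(n)
    p_pos = c * sum(a * factorial(i) * factorial(n - i) / (s ** (n - i + 1) * w ** (i + 1))
                    for i, a in enumerate(a_n))               # (28)
    mean = c * sum(a * factorial(i) * factorial(n - i + 1) / (s ** (n - i + 2) * w ** (i + 1))
                   for i, a in enumerate(a_n))                # (27)
    return mean, mean / p_pos                                 # second entry is (29)

print(expected_failure_rate([1, 1]))
```

For the two made-up observations this prints 53/108 and 53/52, which can be confirmed by integrating the posterior by hand.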
Since there is always a non-zero probability that the last fix
removed the last error, the distributions of $T_{n+1}, T_{n+2}, \ldots$
conditional on $t_1, \ldots, t_n$ are improper and their expectations do
not exist. It is, however, possible to find the posterior
distribution of $T_{n+k}$ conditional on $T_{n+k} < \infty$ (it is a mixture
of Paretos). These distributions, together with the probabilities
$P(T_{n+k} < \infty)$, are useful, and we consider them next.




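4.4 Future reliability, number of faults remaining

Let the random variable $K_n$ denote the number of faults
remaining immediately after the removal of the $n$th fault. Then

    $$P(K_n = k \mid t_1, \ldots, t_n) = P(K_n \ge k \mid t_1, \ldots, t_n) - P(K_n \ge k+1 \mid t_1, \ldots, t_n)$$

    $$= P(\lambda > (n+k-1)\phi \mid t_1, \ldots, t_n) - P(\lambda > (n+k)\phi \mid t_1, \ldots, t_n) \tag{30}$$

Now

    $$P(K_n \ge k+1 \mid t_1, \ldots, t_n) = \int_{\phi=0}^{\infty} \int_{\lambda=(n+k)\phi}^{\infty} p(\lambda, \phi \mid t_1, \ldots, t_n)\, d\lambda\, d\phi$$

    $$= C \sum_{i=0}^{n} \frac{b^{(k)}_{i,n}\,(n-i)!\; i!}{\left(\sum_{j=1}^{n} t_j\right)^{n-i+1}\left(\sum_{j=1}^{n} (n+k-j+1)t_j\right)^{i+1}} \tag{31}$$

after some analysis. Here $C$ is given by (18) (the reciprocal of the
denominator of (11)) and the coefficients $b^{(k)}_{i,n}$, related to the
$a_{i,n}$ coefficients, are given by

    $$\sum_{i=0}^{n} b^{(k)}_{i,n}\, \lambda^{n-i} \phi^{i} = \prod_{j=1}^{n} \big(\lambda + (k+j)\phi\big) \tag{32}$$

It can be shown that

    $$b^{(k)}_{i,n} = \sum_{m=0}^{i} a_{m,n} \binom{n-m}{i-m} k^{i-m} \tag{33}$$

Finally, substituting into (30), we get

    $$P(K_n = k \mid t_1, \ldots, t_n) = C \sum_{i=0}^{n} \frac{i!}{\left(\sum_{j=1}^{n} t_j\right)^{n-i+1}} \sum_{m=0}^{i} \frac{a_{m,n}\,(n-m)!}{(i-m)!} \left[ \frac{(k-1)^{i-m}}{\left(\sum_{j=1}^{n} (n+k-j)t_j\right)^{i+1}} - \frac{k^{i-m}}{\left(\sum_{j=1}^{n} (n+k-j+1)t_j\right)^{i+1}} \right] \tag{34}$$

for $k = 1, 2, \ldots$ (taking $0^0 = 1$), and

    $$P(K_n = 0 \mid t_1, \ldots, t_n) = 1 - C \sum_{i=0}^{n} \frac{a_{i,n}\,(n-i)!\; i!}{\left(\sum_{j=1}^{n} t_j\right)^{n-i+1}\left(\sum_{j=1}^{n} (n-j+1)t_j\right)^{i+1}} \tag{35}$$

which agrees with (19).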
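The distribution of the number of remaining faults can be tabulated by differencing the tail probabilities in (30)-(31). In this sketch the $b^{(k)}_{i,n}$ are obtained by multiplying out the product in (32) directly rather than through (33), which is an implementation choice of ours; the function names are assumptions and the uniform prior with $n \ge 2$ is assumed as before.

```python
from fractions import Fraction
from math import factorial

def shifted_coeffs(n, k):
    """[b_0, ..., b_n] with prod_{j=1..n}(x + k + j) = sum_i b_i x^(n-i); k = 0 gives a_{i,n}."""
    b = [1]
    for j in range(1, n + 1):
        prev = b + [0]
        b = [prev[i] + ((k + j) * prev[i - 1] if i else 0) for i in range(j + 1)]
    return b

def remaining_faults_pmf(times, k_max):
    """Return [P(K_n = k | data) for k = 0..k_max] as exact fractions."""
    ts = [Fraction(x) for x in times]
    n, s = len(ts), sum(ts)
    v = sum((n - j) * x for j, x in enumerate(ts, start=1))  # sum_j (n-j) t_j

    def mix(coef, w):
        return sum(b * factorial(i) * factorial(n - i) / (s ** (n - i + 1) * w ** (i + 1))
                   for i, b in enumerate(coef))

    c = 1 / mix(shifted_coeffs(n - 1, 0), v)                  # (18)

    def at_least(k):
        # P(K_n >= k | data): (31) with k replaced by k - 1; sum_j (n+k-j) t_j = v + k*s
        return Fraction(1) if k == 0 else c * mix(shifted_coeffs(n, k - 1), v + k * s)

    return [at_least(k) - at_least(k + 1) for k in range(k_max + 1)]

pmf = remaining_faults_pmf([1, 1], 3)
print(pmf[0], pmf[1])  # 14/27 626/3375
```

The first entry reproduces $P_0$ from (19). The tail of this distribution decays slowly, so a truncated table should not be expected to sum to one.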

Expressions such as these can give us useful upper bounds on
the number of failures which can occur during the lifetime of the
program, assuming a fault-fixing strategy is adopted. They thus
give upper bounds on lifetime maintenance costs. However, since
the times between failures have distributions which are mixtures
of Pareto distributions, the time needed to uncover the last fault
may be very much larger than a realistic program lifetime. In
such cases we shall obtain pessimistic estimates of maintenance
costs by using (34), (35).

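Consider now the random variable $T_{n+k}$. We shall observe this
random variable only if $K_n \ge k$. Then

    $$P(T_{n+k} > t,\ K_n \ge k \mid t_1, \ldots, t_n) = \int_{\phi=0}^{\infty} \int_{\lambda=(n+k-1)\phi}^{\infty} P(T_{n+k} > t \mid \lambda, \phi)\; p(\lambda, \phi \mid t_1, \ldots, t_n)\, d\lambda\, d\phi$$

    $$= C \sum_{i=0}^{n} \frac{(n-i)!\; i!}{\left(t + \sum_{j=1}^{n} t_j\right)^{n-i+1}\left(\sum_{j=1}^{n} (n+k-j)t_j\right)^{i+1}} \sum_{m=0}^{i} \frac{a_{m,n}\,(n-m)!\,(k-1)^{i-m}}{(i-m)!\,(n-i)!} \tag{36}$$

after some analysis.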

The above expression suggests how we might answer the
important question: how much debugging is still needed before the
program will have achieved its target reliability? Consider a
target reliability expressed as a pair $(t, r)$ such that

    $$P(T > t) > r \tag{37}$$

The target will be achieved in less than $k$ failures if

    $$P(T_{n+k} > t \mid t_1, \ldots, t_n) > r \tag{38}$$

that is

    $$P(T_{n+k} > t \text{ and } K_n \ge k \mid t_1, \ldots, t_n) + P(K_n < k \mid t_1, \ldots, t_n) > r \tag{39}$$

The procedure, then, is to calculate the L.H.S. of (39) for
$k = 1, 2, \ldots$, until the first value of $k$ for which the condition
is satisfied. This is then an estimate of how many more fixes
have to be carried out to achieve the reliability target.
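The search over $k$ described above can be sketched as follows. The two terms of (39) are computed from (31) and (36) as reconstructed from the derivation, with $b^{(k-1)}_{i,n}$ obtained by expanding $\prod_{j=1}^{n}(x + k - 1 + j)$; the function names and the cap `k_max` are assumptions, not part of the original procedure.

```python
from fractions import Fraction
from math import factorial

def shifted_coeffs(n, k):
    """[b_0, ..., b_n] with prod_{j=1..n}(x + k + j) = sum_i b_i x^(n-i)."""
    b = [1]
    for j in range(1, n + 1):
        prev = b + [0]
        b = [prev[i] + ((k + j) * prev[i - 1] if i else 0) for i in range(j + 1)]
    return b

def target_lhs(times, t, k):
    """Left-hand side of (39): P(T_{n+k} > t and K_n >= k) + P(K_n < k), n >= 2."""
    ts = [Fraction(x) for x in times]
    n, s = len(ts), sum(ts)
    v = sum((n - j) * x for j, x in enumerate(ts, start=1))  # sum_j (n-j) t_j

    def mix(coef, s_eff, w):
        return sum(b * factorial(i) * factorial(n - i) / (s_eff ** (n - i + 1) * w ** (i + 1))
                   for i, b in enumerate(coef))

    c = 1 / mix(shifted_coeffs(n - 1, 0), s, v)               # (18)
    b, w = shifted_coeffs(n, k - 1), v + k * s                # w = sum_j (n+k-j) t_j
    joint = c * mix(b, s + Fraction(t), w)                    # (36)
    at_least = c * mix(b, s, w)                               # P(K_n >= k | data)
    return joint + 1 - at_least

def first_k(times, t, r, k_max=1000):
    """Smallest k in 1..k_max satisfying (39), or None if the cap is reached."""
    for k in range(1, k_max + 1):
        if target_lhs(times, t, k) > r:
            return k
    return None

print(target_lhs([1, 1], 1, 1), first_k([1, 1], 1, Fraction(9, 10)))
```

For $k = 1$ the left-hand side is just the current posterior reliability (17), so a result of 1 means the target is already met.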
5. Comparison of BJM and JM analyses of software reliability data

We shall concentrate in this section on analyses of some
software reliability data sets published by Musa [12]. It is a
source of amazement to us that there is so little good quality
software reliability data available in the open literature. There
are two reasons for this. Often data is collected in a manner
which renders it unsuitable for modelling purposes. More
commonly, when suitable data does exist, it is guarded jealously
by the producer organisation in the belief that its publication
would cause loss of confidence in their software products. We
think this belief is mistaken: reputations are more likely to
suffer from the suspicions which these secretive actions
engender.

Musa is to be congratulated on publishing seventeen sets of
data which were collected under carefully controlled conditions.
These data sets seem to be the only ones of reasonable quality
which are readily available. Even this data, representing the
successive execution times between failures ($t_1, t_2, \ldots, t_i, \ldots$),
occasionally gives rise to disquiet. Simple tests of trend show
that only a few of the programs are exhibiting reliability growth
[13]. In what follows, we have concentrated on these (with the
exception of System 40, which is discussed later). There is some
evidence that the successive times are correlated. Of course,
this does not necessarily imply criticism of the data collection,
but successions of small observations might suggest "poor fixes".
We have not attempted to eliminate any of these effects in what
follows, so as not to be open to charges of massaging the data.
Our procedure for examining model performance is discussed in
detail in [16] and has also been used by other workers in the
reliability growth field [17, 18]. Briefly, we shall compare
model predictions with actual observations with the intention of
asking the same question as might be asked by a potential user of
two rival models. In what follows we shall concentrate on the
ability of the models to estimate current reliability; see [16]
for a discussion of the problem of examining the quality of
longer-term predictions of reliability.

Assume that $(i-1)$ failures have been observed, so our data
set is $t_1, t_2, \ldots, t_{i-1}$, and we are interested in the current
reliability. This is a statement about the random variable $T_i$.
Consider the predictor cdf of $T_i$, say $\hat{F}_i(\cdot)$. For the JM model
this is

    $$\hat{F}_i(t) = F_i(t; \hat{N}, \hat{\phi}) = 1 - e^{-(\hat{N} - i + 1)\hat{\phi}\, t} \tag{40}$$

where $\hat{N}$ and $\hat{\phi}$ are ML estimates of $N$, $\phi$ based on $t_1, \ldots, t_{i-1}$.
For the BJM model we use

    $$\hat{F}_i(t) = 1 - R_i(t \mid t_1, \ldots, t_{i-1}) \tag{41}$$

which is obtainable from (17) and (18). All statements about the
current time to failure random variable involve $\hat{F}_i(\cdot)$, so it
seems plausible to base our examination of the quality of the
model upon this. If $\hat{F}_i(\cdot)$ were the "true" distribution of $T_i$,
then

    $$U_i = \hat{F}_i(T_i) \tag{42}$$

would be uniformly distributed on (0,1). and be at least asymptotically 
We shall consider their realisations

    $$u_i = \hat{F}_i(t_i) \tag{43}$$

where $t_i$ is the realisation of $T_i$, i.e. it is the actual
observed time between the $(i-1)$th and $i$th failures. These
numbers thus form the basis of a comparison of our predictions
(based on $t_1, \ldots, t_{i-1}$) and our actual $t_i$. Our first tool will
be the quantile-quantile (Q-Q) plot: i.e. a plot of the ordered
set of $m$ $u_i$'s against $i/m$. The closeness of this to the line
of unit slope is an indication of the closeness of the $u$'s to
uniformity, and so an indication of the quality of prediction of
the model. We shall refer to this as Procedure 1.
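Procedure 1 needs only the list of $u_i$ values. The sketch below builds the plotted points and, as a single-number summary that is our addition rather than part of the paper's procedure, the largest vertical gap from the line of unit slope. The helper `jm_u` assumes the JM predictor is the exponential cdf with rate $(\hat{N} - i + 1)\hat{\phi}$ implied by (1) and (2); all names are illustrative.

```python
from math import exp

def jm_u(t_obs, n_hat, phi_hat, i):
    """u_i for the JM predictor, given ML estimates based on t_1..t_{i-1}."""
    return 1.0 - exp(-(n_hat - i + 1) * phi_hat * t_obs)

def u_plot(us):
    """Ordered u's paired with i/m, plus the largest vertical gap from the unit-slope line."""
    m = len(us)
    points = [(i / m, u) for i, u in enumerate(sorted(us), start=1)]
    gap = max(abs(u - x) for x, u in points)
    return points, gap

points, gap = u_plot([0.2, 0.8, 0.5, 0.1])   # made-up u values
print(points, gap)
print(jm_u(5.0, n_hat=24, phi_hat=0.01, i=25))
```

The last line shows what happens when the ML estimate equals the number of faults already fixed: the predicted failure rate is zero, so $u_i = 0$ whatever the observed time.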

Braun and Paine [17] suggest that the plot of $u_i$ (not
reordered) versus $i$ should also be examined: it should look
"patternless" if the model is performing well. Presumably the
intention is to attempt to discover how well the model is
capturing the trend. We have found these plots quite difficult
to interpret, and have instead used the following informal
procedure. If the $u_i$ given by (43) really were realisations
of independent, identically distributed (iid) uniform random
variables then

    $$x_i = -\log(1 - u_i) \tag{44}$$

would be realisations of iid unit exponential random variables.
Thus a process with interevent times given by these $x_i$'s would
be a realisation of a simple Poisson process. It is well known
[10] that if we take the time to the $(m+1)$th event in such a
process to be unity, the times of occurrence of the $m$ events
are independently uniformly distributed over (0,1). In our case,
if the model is performing well, this statement will be
approximately true. Since the procedure is well known, in the
exact case, to be sensitive to trend [19], we might expect to be
able to detect when a model is not capturing the trend
(reliability growth) adequately. We proceed by plotting the
empirical cumulative distribution function of the numbers

    $$y_i = \sum_{j=1}^{i} x_j \Big/ \sum_{j=1}^{m} x_j \tag{45}$$

Again, good performance is indicated by closeness of this to the
line of unit slope through the origin. We shall refer to this
as Procedure 2.
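The transformation behind Procedure 2 is short enough to state as code. This sketch follows (44) and (45) with the normalising sum taken over all $m$ values, so the final $y$ equals 1 by construction; the function name `y_plot_values` is an assumption, and the input must not contain $u_i = 1$, for which the logarithm is undefined.

```python
from math import log

def y_plot_values(us):
    """Normalised cumulative sums y_i of x_i = -log(1 - u_i), in the original order."""
    xs = [-log(1.0 - u) for u in us]   # (44)
    total = sum(xs)
    ys, running = [], 0.0
    for x in xs:
        running += x
        ys.append(running / total)     # (45)
    return ys

print(y_plot_values([0.5, 0.5, 0.5, 0.5]))   # made-up u values
```

Unlike Procedure 1, the $u_i$ are not reordered here, which is why the resulting plot responds to trend in the sequence.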

It is perhaps worth stating explicitly that in neither of
these two informal procedures is it our intention to carry out a
goodness-of-fit test. On the contrary, in the context of this
paper it is our contention that the JM and BJM models are
virtually identical: thus if one performs notably better than the
other the reason will presumably be that the inference procedures
are performing differently. Our two informal procedures are
designed to emulate the behaviour of an actual user of a model,
who is interested primarily in whether he can trust the model
predictions. The general problem of examining the quality of
model predictions, as opposed to testing goodness-of-fit of models,
is an interesting one which has received relatively little
attention.

Table 1 shows the data and calculations on both models for
Musa's System 3. As has been stated earlier, the JM model
performs badly by giving $\hat{N} = n$ for sample sizes 25 through 38.
The BJM model gives probabilities for such perfection, $P_0$, which
although appreciable, differ considerably from unity. Notice
that the range of sample sizes for which BJM gives $P_0$
significantly different from zero, sizes 25 through 38, is the
same as produces $\hat{N} = n$ for JM. The Q-Q plots of the two
models (Procedure 1) are shown in Plot 1, where it is obvious
that BJM gives considerable improvement over JM. The poor
behaviour of the JM model is almost entirely accounted for by
the cluster of values at zero corresponding to $i = 25, \ldots, 38$. It
is surprising that the worst results from the model come from
the end of the data series: it might be expected that with larger
samples the estimation procedure would perform better. If this
program is close to being bug-free, the results cast further doubt
on the practical usefulness of the observation of Forman and
Singpurwalla [14], that ML estimates of $N$ can be trusted at the
end of debugging. It is certainly not the case that a zero value
for $\hat{N} - n$ gives high confidence that the program is now perfect.

Plot 2 shows the result of applying Procedure 2 to this data
with the two models. The cluster of zero observations at the end
of testing for JM causes the poor performance, as might be expected.
However, the plot is reasonably linear, albeit with wrong slope:
this suggests that the trend is being captured fairly well in the
earlier stages of testing. Although the BJM model is considerably
better, the concavity of the plot again suggests that the trend is
not being captured completely. In fact the BJM model is also
giving optimistic answers for $n = 25$ onwards, although not nearly
so optimistic as JM.

Table 2, Plots 3 and 4, show the results of analyses of 
System 40 |12l data. The JM model, using Ml. estimation, gives 
N i for n - 76 through 99. This accounts for the large 
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deviation at the origin of Plot 3, and the extreme departure 
from the line of unit slope in Plot 4. BJM gives a better 
Plot 3., but is still quite optimistic in its predictions, and 
the Plot 4 behaviour is little better than JM. A closer 
examination of the data set causes some disquiet, and may explain 
the poor results. Simple trend tests ([19], p.47) were carried 
out on the whole data set, and the first and last halves separately. 
These show significant growth overall, but not significant growth 
in either half. The lack of growth in the first half of the data 
is revealed in the many infinite estimates of N in the JM model, 
see [11]. There seems, therefore, to be evidence that this program 
exhibited a quite sudden, perhaps discrete, improvement in 
reliability half way through the collected data. This would explain 
the overall reliability growth, but the absence of growth in each 
half. Musa, tiowever, does not recall any conditions in testing 
which would have produced such an effect. Of course, it is 
unreasonable to expect any reliability growth model to perform well 
in a case like this : all models assume some homogeneity of growth 
behaviour. This data set exhibits some of the pitfalls we have to 
beware of when analysing software failure data. In many cases we 
shall know when a discrete change of behaviour has occurred (change 
in testing procedure, integration of more code): we cannot normally 
expect models to perform well over such a discontinuity of behaviour. 

Examination of the JM model Plot 4 reveals an extremely "jagged" 
behaviour even for the first 20 or so plotted points: evidence of 
larger variability in the $x_i$'s than would be suggested if the model 
were performing well. This is clear from the raw data, where there 
appear to be more very large and very small observations than would 
be plausible under the exponential assumptions of JM (and BJM). 
The BJM model handles this variability a little better than JM 
(see, for example, the distances between steps 2 and 3 in Plot 4 
for the two models, corresponding to the large 54th observation): 
perhaps because of the long-tailed nature of the mixed Pareto 
posterior time-to-next-failure distributions. Our suspicion is 
that the data is contaminated in some way: it may be, for 
example, that the small observations represent imperfect fixes 
and ought to be excluded. Unfortunately, any filtering of the 
data has to be carried out by the data-collector at the time of 
collection, using criteria which are based on an analysis of the 
actual circumstances of the failures. It does not seem possible 
to base a data rejection procedure solely upon the data itself. 
Accordingly, our analyses were performed on the data as published, 
and we merely record our reservations. 

Tables 1 and 2 are revealing about the general 
untrustworthiness of estimates of N in JM. Advocates of the 
JM model have argued that knowledge of N, or more precisely 
N-n (the number of remaining faults) is of great practical 
interest and can be provided by use of this model. One of us has 
suggested elsewhere [20] that reliability itself is the only metric 
of interest. Tables 1 and 2, which are fairly typical of the 
analyses we have seen of real data sets, show estimates of the 
number of remaining faults fluctuating wildly between infinity 
and zero. Our inability to obtain good estimates seems to us to 
render purely academic any discussion of their utility. If only 
the quality of reliability prediction is the issue, then models 
which treat failure rate directly [7, 22] can be considered on an 
equal footing with fault-counting models. 
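
The extreme behaviour of the ML estimate of N can be reproduced with a short sketch. It assumes the standard JM likelihood with failure rate (N - i + 1)φ during the ith interval, treats N as a continuous parameter bounded below by n, and uses the finiteness condition on the likelihood discussed in [11] as it is recalled here; the function name `jm_mle` and the data are hypothetical.

```python
import math

def jm_mle(times):
    """Return (N_hat, lambda_hat) for the JM model, with lambda = N * phi.

    N_hat is math.inf when the likelihood keeps increasing in N, and
    equals n when the likelihood is maximised on the boundary N = n.
    """
    n = len(times)
    total = sum(times)
    # sum of (i - 1) * t_i with i counted from 1
    weighted = sum(i * t for i, t in enumerate(times))
    if weighted / total <= (n - 1) / 2.0:
        # No finite maximum: the fit degenerates to a constant rate n / total.
        return math.inf, n / total

    def score(big_n):
        # Derivative of the profile log-likelihood with respect to N.
        harmonic = sum(1.0 / (big_n - i) for i in range(n))
        return harmonic - n * total / (big_n * total - weighted)

    if score(float(n)) <= 0.0:
        n_hat = float(n)  # boundary estimate: "no faults remain"
    else:
        lo, hi = float(n), 2.0 * n
        while score(hi) > 0.0:
            hi *= 2.0
        for _ in range(100):  # bisection on the sign change of the score
            mid = 0.5 * (lo + hi)
            if score(mid) > 0.0:
                lo = mid
            else:
                hi = mid
        n_hat = 0.5 * (lo + hi)
    phi_hat = n / (n_hat * total - weighted)
    return n_hat, n_hat * phi_hat

no_growth = jm_mle([5.0, 4.0, 3.0, 2.0, 1.0])          # N_hat infinite
strong_growth = jm_mle([1.0, 2.0, 4.0, 8.0, 16.0])     # N_hat equals n
mild_growth = jm_mle([1.0, 1.0, 2.0, 2.0, 3.0, 3.0])   # N_hat between 7 and 8
```

Three short made-up sequences are enough to produce an infinite estimate, an estimate of zero remaining faults, and an interior estimate.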

Plots 5 and 6 relate to System 1 data [12]. Although this 
data set does not produce zero or infinite estimates of the number 
of remaining faults, the estimates of N are consistently 
optimistic, being only slightly larger than the sample size. 
Reliability estimation (Plot 5) for JM is here better than in the 
two previous examples, although again optimistic. The BJM model 
is slightly better, but also gives optimistic results. 
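
A plot of this kind can be summarised numerically. The sketch below assumes that Procedure 1 compares the sample cdf of the probability integral transforms with the line of unit slope, and measures the largest vertical gap between them; the function names and the artificial exponential data are illustrative only.

```python
import math

def u_plot_distance(u_values):
    """Largest vertical gap between the sample cdf of the u's and the unit-slope line."""
    ordered = sorted(u_values)
    m = len(ordered)
    gap = 0.0
    for k, u in enumerate(ordered):
        # The step function jumps from k/m to (k+1)/m at u.
        gap = max(gap, abs((k + 1) / m - u), abs(k / m - u))
    return gap

def exponential_cdf(t, rate):
    return 1.0 - math.exp(-rate * t)

m = 20
true_rate = 1.0
# Artificial observations: evenly spaced quantiles of the true distribution.
observed = [-math.log(1.0 - (k - 0.5) / m) / true_rate for k in range(1, m + 1)]

# A predictor with the right rate, and an optimistic one whose rate is too small.
d_accurate = u_plot_distance([exponential_cdf(t, true_rate) for t in observed])
d_optimistic = u_plot_distance([exponential_cdf(t, true_rate / 4.0) for t in observed])
```

The optimistic predictor yields u values that are too small, so its sample cdf departs much further from the line of unit slope.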

Plot 6 is quite interesting. In the first place, there is 
very little difference between BJM and JM on this plot, suggesting 
that each model captures the trend with similar accuracy. Since 
BJM is slightly better on Plot 5, this might suggest that the shape 
of the distributions of time to failure is represented better by the 
mixed Paretos than by exponentials. Of more interest, though, is 
the shape of Plot 6. Until about observation 90, the plot is 
reasonably linear (if we ignore the first four extremely small 
observations on this plot). Both models seem to be performing well 
between sample sizes 34 and 90, and only start to give very 
optimistic reliability predictions from 90 onwards. This may again 
suggest that some discrete change of behaviour has occurred. Musa, 
who collected this data set personally, is not aware of any such 
change, so the apparent effect may be spurious. 
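
The kind of departure from linearity discussed for Plot 6 can be illustrated with a sketch. It assumes that Procedure 2 is the construction in which each transform u_i is mapped to x_i = -ln(1 - u_i) and the normalised partial sums of the x_i are compared with the line of unit slope; that reading, the function names, and the data are assumptions made for this example.

```python
import math

def y_values(u_values):
    """Normalised partial sums of x_i = -ln(1 - u_i), omitting the final sum (which is 1)."""
    xs = [-math.log(1.0 - u) for u in u_values]
    total = sum(xs)
    ys, running = [], 0.0
    for x in xs[:-1]:
        running += x
        ys.append(running / total)
    return ys

def distance_from_unit_line(values):
    """Largest vertical gap between the sample cdf of the values and the unit-slope line."""
    ordered = sorted(values)
    m = len(ordered)
    gap = 0.0
    for k, v in enumerate(ordered):
        gap = max(gap, abs((k + 1) / m - v), abs(k / m - v))
    return gap

# Hypothetical transforms: one sequence with no trend, one drifting steadily downwards.
steady = [0.5] * 10
drifting = [0.95 - 0.1 * k for k in range(10)]

d_steady = distance_from_unit_line(y_values(steady))
d_drifting = distance_from_unit_line(y_values(drifting))
```

A steady sequence stays close to the line, whereas a sequence whose transforms drift over time produces a large gap, signalling that a trend is not being captured.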

It is possible that, in cases such as these, better results 
would be obtained by not using all the data for the later 
predictions. We might choose to base each calculation only on the 
last 50 observations, say, in order to make the model fairly 
responsive to discrete changes in behaviour. It would be very 
difficult to justify a particular choice of "lag", though, and our 
own feeling is that greater care should be taken to ensure 
homogeneity of behaviour during data collection. 

Plots 7 and 8 concern Musa's System 2 data [12]. These 
plots, like the previous ones, are typical of the results we have 
found on other data sets: BJM is noticeably better than JM, but 
still gives optimistic answers. In all the data sets we have 
analysed, BJM is better than JM. However, the degree of 
improvement obtained by using BJM varies considerably; it is 
greatest when JM gives N = n for a substantial range of n. 

In all cases the JM model errs on the side of optimism, as does 
the BJM model but less markedly. We shall discuss this issue in 
more detail in the next section. 

6. Conclusions and discussion of possible future work 

In our work so far, we have not found any data sets for which 
JM performs better than BJM, and in several cases BJM is much 
better. We therefore suggest that any practitioner who is 
tempted to use the JM model should instead use the BJM model. 

Since all important metrics for the BJM model are available in 
closed form, use of this model brings an important bonus of 
computational simplicity. 

The BJM model has certain conceptual advantages over the JM 
model. Perhaps most important, it allows calculation of the 
probability that the program is currently fault-free. It also 
gives estimates in closed form of the remaining number of fixes 
to be carried out to achieve target reliability, as well as the 
number of faults remaining in the program. These metrics could be 
of great value in estimating the extra development effort needed, 
as well as providing information about maintenance costs. 

We believe, then, that there are considerable potential 
advantages to be gained in using the BJM rather than JM model. 
Accordingly we recommend the new model to users, who can be 
confident that they will at least obtain answers which are no 
worse than would have been obtained by the old model. 

Having said that, we think it is important that users of any 
software reliability model do not simply assume that the metrics 
are trustworthy. We suggest that whenever a data set is analysed, 
the quality of the metrics on that data set should be examined. 
This can easily be performed using our techniques or other 
informal methods. This kind of analysis can never give assurance 
that the correct model has been applied, but it will usually be able to 
detect the use of a grossly unsuitable model and/or inference procedure. 

So far we have compared the two models and shown that BJM 
never seems to be worse than JM. We now consider the fact that 
BJM still seems to give optimistic predictions in most cases. The 
degree of this optimism varies considerably from one program to 
another, which reinforces our suggestion that in each application 
it is advisable to investigate the quality of the predictions. It 
is interesting that the model always seems to deviate in the same 
direction: towards being too optimistic. We believe this is a 
consequence of the basic assumption underlying both models, that 
all faults contribute the same amount to the overall failure rate 
of the program. In fact, it seems much more plausible that a 
program starts life containing faults of different sizes, i.e. 
faults which contribute different amounts to the program failure 
rate. Since both models assume faults are uncovered randomly, 
this would imply that the times to discovery of different faults 
are differently (not identically) exponentially distributed. This 
scenario seems to accord with experience: some program faults seem 
"difficult to find" in the sense that, if they were left in the 
program, they would manifest themselves in program failures very 
infrequently. Others seem to be associated with high occurrence 
rates. This effect is modelled via a Bayesian argument in a recent 
paper [21] by one of us. 

If the rates associated with the pool of faults initially in 
the program really are different, and if fault discovery (failure 
occurrence) occurs randomly (as both JM and BJM assume), then it 
might be expected that early fixes cause greater improvement in 
the failure rate of the program than later fixes. This "law of 
diminishing returns" of debugging would be represented by a 
failure rate, as a function of i (failure number), which is shown 
in Fig 2. Using the JM or BJM model could then be seen as somehow 
"best fitting" a linear function to this non-linear graph. Fig 2 
shows how such an operation might be expected to give optimistic 
estimates of the current failure rate. Confirmation of this 
hypothesis comes from an examination of the behaviour of the ML 
estimate of $\lambda = N\phi$, the initial failure rate, in the JM model. 
This is the intercept on the vertical axis on Fig 2. If our 
assertion were correct, we would expect that, as the sample size 
increased, the "best-fitting" straight line on Fig 2 would have 
decreasing slope $\phi$, and decreasing intercept $\lambda$. This is 
easily seen to be true for System 3 (and System 40, despite our 
reservations about this data) by considering how $\hat{N}\hat{\phi}$ changes 
with n. It is also true for the other two data sets considered 
here, and all other Musa data sets we have analysed. We are not 
aware of any other explanation for such a consistent effect, and 
it does not seem to have been noticed by other authors. 
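
The "diminishing returns" argument can be checked with a small simulation. The sketch assumes that each fault has its own exponential rate and that the next fault to show itself is chosen with probability proportional to its rate (the usual property of competing exponentials); the rates, the number of runs, and the function name are arbitrary choices for illustration.

```python
import random

def mean_rate_by_fix(fault_rates, runs=2000, seed=0):
    """Average program failure rate after 0, 1, ..., n fixes."""
    rng = random.Random(seed)
    n = len(fault_rates)
    totals = [0.0] * (n + 1)
    for _ in range(runs):
        remaining = list(fault_rates)
        for fixes in range(n + 1):
            totals[fixes] += sum(remaining)
            if remaining:
                # The fault found next is picked in proportion to its rate,
                # then removed by a perfect fix.
                found = rng.choices(range(len(remaining)), weights=remaining)[0]
                remaining.pop(found)
    return [total / runs for total in totals]

# Faults of very different "sizes" versus equal-sized faults (the JM assumption).
unequal = mean_rate_by_fix([8.0, 4.0, 2.0, 1.0, 0.5, 0.25])
equal = mean_rate_by_fix([1.0] * 5)

unequal_drops = [unequal[k] - unequal[k + 1] for k in range(len(unequal) - 1)]
equal_drops = [equal[k] - equal[k + 1] for k in range(len(equal) - 1)]
```

With unequal rates the first fix removes far more failure rate on average than the last, whereas with equal rates every fix removes the same amount, giving the linear function of Figure 1.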

We hope to report shortly on some recent work using the new 
model [21], with non-linear failure rate function. Preliminary 
results show that it seems to perform notably better than JM or 
BJM. Our hypothesis that early fixes cause greater improvements than 
later ones is supported by some recent empirical studies of Nagel and 
Skrivan [24]. This interesting work suggests that the differences 
in size of different faults may be surprisingly large. 



References 

[1] Z. Jelinski and P.B. Moranda, "Software reliability research", 
in Statistical Computer Performance Evaluation (W. Freiberger, 
ed.). New York: Academic Press, 1972, pp. 465-484. 

[2] M. Shooman, "Operational testing and software reliability 
during program development", in Record, 1973 IEEE Symp. 
Computer Software Reliability, New York, NY, 1973, 
April 30-May 2, pp. 51-57. 

[3] M. Shooman, "Probabilistic models for software reliability 
and prediction", in Statistical Computer Performance 
Evaluation (W. Freiberger, ed.). New York: Academic Press, 
1972, pp. 485-502. 

[4] J.D. Musa, "A theory of software reliability and its 
application", IEEE Trans. Software Engineering, Vol SE-1, 
1975 Sept, pp. 312-327. 

[5] A.K. Goel and K. Okumoto, "Bayesian software prediction 
models, Vol 1: An imperfect debugging model for reliability 
and other quantitative measures of software systems", 
RADC-TR-78-155, Rome Air Development Center, NY, 1978. 

[6] A.K. Goel, "Software error detection model with applications", 
J. Systems and Software, 1, 1980, pp. 243-249. 

[7] B. Littlewood and J.L. Verrall, "A Bayesian reliability growth 
model for computer software", J. Royal Statist. Soc., 
C (Applied Statistics), 22, 1973, pp. 332-346. 

[8] L.H. Crow, "Confidence interval procedures for reliability 
growth analysis", Tech. Report No. 197, US Army Material 
Systems Analysis Activity, Aberdeen, Md., 1977. 



[9] B. Littlewood, "How to measure software reliability growth 
and how not to", IEEE Trans. Reliability, Vol R-28, 1979 
June, pp. 103-110. 

[10] E.H. Forman, "Statistical models and methods for measuring 
software reliability", D.Sc. dissertation, SEAS, George 
Washington U., Washington DC, 1974. 

[11] B. Littlewood and J.L. Verrall, "On the likelihood function 
of a debugging model for computer software reliability", 
IEEE Trans. Reliability, Vol R-30, 1981 June, pp. 145-148. 

[12] J.D. Musa, "Software reliability data", report available 
from Data and Analysis Center for Software, Rome Air 
Development Center, Rome, NY. 

[13] P.A. Keiller, "Comparison of two software reliability models", 
Project for professional degree, submitted to School of 
Engineering and Applied Science, George Washington University, 
Washington, DC, 1982. 

[14] E.H. Forman and N.D. Singpurwalla, "An empirical stopping rule 
for debugging and testing computer software", J. Amer. Statist. 
Assoc., Vol 72, 1977 Dec, pp. 750-757. 

[15] National Bureau of Standards, Handbook of Mathematical 
Functions. Washington, DC, 1964. 

[16] A. Iannino, J.D. Musa, K. Okumoto and B. Littlewood, "Criteria 
for software reliability model comparisons", draft available 
from first author: Bell Labs., Whippany, NJ 07981 (1981). 

[17] H. Braun and J.M. Paine, "A comparative study of models for 
reliability growth", Tech. Report No. 126, Series 2, Department 
of Statistics, Princeton University, July 1977. 



[18] H. Braun and N. Schenker, "New models for reliability growth", 
Tech. Report No. 174, Department of Statistics, Princeton 
University, October 1980. 

[19] D.R. Cox and P.A.W. Lewis, Statistical Analysis of Series of 
Events, Methuen, London 1966. 

[20] B. Littlewood, "What makes a reliable program: few bugs or a 
small failure rate?" Proc. 1980 National Computer Conference, 
AFIPS Press, Arlington, VA, 1980, pp. 707-713. 

[21] B. Littlewood, "Stochastic reliability growth: a model for 
fault-removal in computer programs and hardware designs", 
IEEE Trans. Reliability, Vol R-30, 4 Oct. 1981, pp. 313-320. 

[22] B. Littlewood, "Theories of software reliability: how good are 
they and how can they be improved", IEEE Trans. Software 
Engineering, SE-6, 1980, pp. 489-500. 

[23] M.H. De Groot, Optimal Statistical Decisions. New York: 
McGraw-Hill, 1970. 

[24] P.M. Nagel and J.A. Skrivan, "Software reliability: repetitive 
run experimentation and modelling", BCS-40399, Boeing Computer 
Services Company, Seattle, Washington, December 1981. 


- 39 - 


OniGtNAL 

OF POO** 

Appendix 

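The posterior distribution of $(\lambda, \phi)$ with proper 
independent gamma priors is 

\[
p(\lambda,\phi \mid t_1,\dots,t_n) = C^{*}\,\lambda(\lambda-\phi)\cdots(\lambda-[n-1]\phi)\,
\exp\{-\lambda t_1-\dots-(\lambda-[n-1]\phi)t_n\}\,\lambda^{b-1}e^{-c\lambda}\,\phi^{f-1}e^{-g\phi}
\tag{A1}
\]

for $\lambda > (n-1)\phi$ and zero otherwise, where $C^{*}$ is a normalising 
constant. 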

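If we transform to $(x, \phi)$ where $x = \lambda - (n-1)\phi$, we get 

\[
p(x,\phi \mid t_1,\dots,t_n) \propto \prod_{i=0}^{n-1}(x+i\phi)\,[x+(n-1)\phi]^{b-1}\,
e^{-c(x+(n-1)\phi)}\,\phi^{f-1}e^{-g\phi}\,\exp\Bigl\{-\sum_{k=1}^{n}(x+[n-k]\phi)\,t_k\Bigr\}
\]
\[
= \sum_{i=0}^{n-1}\sum_{j=0}^{b-1} a_i\binom{b-1}{j}(n-1)^{b-j-1}\,
x^{n-i+j}\,e^{-x[c+\sum_k t_k]}\;\phi^{b+f+i-j-2}\,e^{-\phi[g+c(n-1)+\sum_k (n-k)t_k]}
\tag{A2}
\]

where $a_i$ denotes the coefficient of $x^{n-i}\phi^{i}$ in the expansion of 
$\prod_{m=0}^{n-1}(x+m\phi)$. This is a finite mixture of distributions of the form 

\[
\Gamma\Bigl(x;\; n+j-i+1,\; c+\sum_k t_k\Bigr)\cdot\Gamma\Bigl(\phi;\; b+f+i-j-1,\; g+c(n-1)+\sum_k (n-k)t_k\Bigr)
\tag{A3}
\]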

Denote by F the class of distributions which are finite 
mixtures like (A3): i.e. finite mixtures of $\Gamma(x; \alpha, \beta)\cdot\Gamma(\phi; \gamma, \delta)$ 
where $\alpha$ is integer. Since x can be thought of as "current 
failure rate" (i.e. the failure rate of the program after the nth 
failure but before the nth fix), the above result can be 
generalised and given the following interpretation. If we choose 
our prior for current failure rate and $\phi$ from F, then under the 
BJM model our posterior for new current failure rate and $\phi$ will 
also be a member of F. This idea is similar to the concept of 
"closed under sampling" [23], and we can think of the family F as 
being in some way a natural conjugate to the BJM model. The 
prior we have used is clearly a member of F, and the above 
observation gives some support to our choice; it can also be used 
to support the particular form of "ignorance prior" used in the 
body of the paper. 

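We want the predictive reliability function of the next inter-failure time $T_{n+1}$: 

\[
P(T_{n+1}>t \mid t_1,\dots,t_n) = \int_{\phi=0}^{\infty}\int_{\lambda=(n-1)\phi}^{\infty}
P(T_{n+1}>t \mid \lambda,\phi)\; p(\lambda,\phi \mid t_1,\dots,t_n)\,d\lambda\,d\phi
\tag{A4}
\]

where 

\[
P(T_{n+1}>t \mid \lambda,\phi) =
\begin{cases}
e^{-(\lambda-n\phi)t} & \text{if } \lambda > n\phi \\
1 & \text{if } (n-1)\phi < \lambda < n\phi
\end{cases}
\tag{A5}
\]

so that 

\[
P(T_{n+1}>t \mid t_1,\dots,t_n) =
\int_{\phi=0}^{\infty}\int_{\lambda=n\phi}^{\infty} e^{-(\lambda-n\phi)t}\,p(\lambda,\phi \mid t_1,\dots,t_n)\,d\lambda\,d\phi
+ \int_{\phi=0}^{\infty}\int_{\lambda=(n-1)\phi}^{n\phi} p(\lambda,\phi \mid t_1,\dots,t_n)\,d\lambda\,d\phi
\tag{A6}
\]

The first integral in (A6) is 

\[
C^{*}\int_{\phi=0}^{\infty}\int_{\lambda=n\phi}^{\infty} e^{-(\lambda-n\phi)t}\,
\lambda(\lambda-\phi)\cdots(\lambda-[n-1]\phi)\,
\exp\{-\lambda t_1-\dots-(\lambda-[n-1]\phi)t_n\}\,
\lambda^{b-1}e^{-c\lambda}\,\phi^{f-1}e^{-g\phi}\,d\lambda\,d\phi .
\]

Putting $x = \lambda - n\phi$ this becomes 

\[
C^{*}\int_{\phi=0}^{\infty}\int_{x=0}^{\infty} e^{-xt}\,\prod_{m=1}^{n}(x+m\phi)\,(x+n\phi)^{b-1}\,
\exp\Bigl\{-\sum_{k=1}^{n}(x+[n-k+1]\phi)\,t_k\Bigr\}\,e^{-c(x+n\phi)}\,\phi^{f-1}e^{-g\phi}\,dx\,d\phi
\]
\[
= C^{*}\sum_{i=0}^{n}\sum_{j=0}^{b-1} a'_i\binom{b-1}{j}\int_{\phi=0}^{\infty}\int_{x=0}^{\infty}
x^{n-i}\phi^{i}\;x^{j}(n\phi)^{b-j-1}\,e^{-x(c+t+\sum_k t_k)}\,\phi^{f-1}\,e^{-\phi(g+nc+\sum_k (n-k+1)t_k)}\,dx\,d\phi
\]

where $a'_i$ denotes the coefficient of $x^{n-i}\phi^{i}$ in the expansion of 
$\prod_{m=1}^{n}(x+m\phi)$. 
(Notice that it is at this stage that we need b to be an integer.) 

\[
= C^{*}\sum_{i=0}^{n}\sum_{j=0}^{b-1} a'_i\binom{b-1}{j}\,n^{b-j-1}\,
\frac{\Gamma(n-i+j+1)\,\Gamma(b+f+i-j-1)}
{\bigl(c+t+\sum_k t_k\bigr)^{n-i+j+1}\,\bigl(g+nc+\sum_k (n-k+1)t_k\bigr)^{b+f+i-j-1}}
\tag{A7}
\]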


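The second integral in (A6) can be expressed as a difference 
of two integrals, of which one is 

\[
C^{*}\int_{\phi=0}^{\infty}\int_{\lambda=n\phi}^{\infty}
\lambda(\lambda-\phi)\cdots(\lambda-[n-1]\phi)\,
\exp\{-\lambda t_1-\dots-(\lambda-[n-1]\phi)t_n\}\,
\lambda^{b-1}e^{-c\lambda}\,\phi^{f-1}e^{-g\phi}\,d\lambda\,d\phi
\]

which is simply (A7) with t = 0: 

\[
C^{*}\sum_{i=0}^{n}\sum_{j=0}^{b-1} a'_i\binom{b-1}{j}\,n^{b-j-1}\,
\frac{\Gamma(n-i+j+1)\,\Gamma(b+f+i-j-1)}
{\bigl(c+\sum_k t_k\bigr)^{n-i+j+1}\,\bigl(g+nc+\sum_k (n-k+1)t_k\bigr)^{b+f+i-j-1}}
\tag{A8}
\]

where $a'_i$ is the coefficient of $x^{n-i}\phi^{i}$ in $\prod_{m=1}^{n}(x+m\phi)$, as in (A7). 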
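
The other integral needed for the second term of (A6) is 

\[
C^{*}\int_{\phi=0}^{\infty}\int_{\lambda=(n-1)\phi}^{\infty}
\lambda(\lambda-\phi)\cdots(\lambda-[n-1]\phi)\,
\exp\{-\lambda t_1-\dots-(\lambda-[n-1]\phi)t_n\}\,
\lambda^{b-1}e^{-c\lambda}\,\phi^{f-1}e^{-g\phi}\,d\lambda\,d\phi .
\]

Put $\lambda - [n-1]\phi = x$ and we have 

\[
C^{*}\int_{\phi=0}^{\infty}\int_{x=0}^{\infty} x(x+\phi)\cdots(x+[n-1]\phi)\,(x+[n-1]\phi)^{b-1}\,
e^{-x\sum_k t_k}\,e^{-\phi\sum_k (n-k)t_k}\,e^{-c(x+[n-1]\phi)}\,\phi^{f-1}e^{-g\phi}\,dx\,d\phi
\]
\[
= C^{*}\sum_{i=0}^{n-1}\sum_{j=0}^{b-1} a_i\binom{b-1}{j}(n-1)^{b-j-1}
\int_{\phi=0}^{\infty}\phi^{b+f+i-j-2}\,e^{-\phi(g+c(n-1)+\sum_k (n-k)t_k)}\,d\phi
\int_{x=0}^{\infty}x^{n-i+j}\,e^{-x(c+\sum_k t_k)}\,dx
\]
\[
= C^{*}\sum_{i=0}^{n-1}\sum_{j=0}^{b-1} a_i\binom{b-1}{j}(n-1)^{b-j-1}\,
\frac{\Gamma(n-i+j+1)\,\Gamma(b+f+i-j-1)}
{\bigl(c+\sum_k t_k\bigr)^{n-i+j+1}\,\bigl(g+c(n-1)+\sum_k (n-k)t_k\bigr)^{b+f+i-j-1}}
\tag{A9}
\]

where $a_i$ is the coefficient of $x^{n-i}\phi^{i}$ in $\prod_{m=0}^{n-1}(x+m\phi)$. 

Finally, we have from (A6) 

\[
P(T_{n+1}>t \mid t_1,\dots,t_n) = (\mathrm{A9}) + (\mathrm{A7}) - (\mathrm{A8})
\]

and, since the posterior integrates to one, 

\[
1 = (\mathrm{A9}),
\]

which determines $C^{*}$. 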

It is easy to see that we obtain the improper uniform prior 
result, (17), when we let f = b = 1 and c, g → 0. 

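The probability that the program is now "perfect", i.e. that 
the last error has been removed, is 

\[
P_0 = (\mathrm{A9}) - (\mathrm{A8})
= 1 - \frac{\displaystyle\sum_{i=0}^{n}\sum_{j=0}^{b-1} a'_i\binom{b-1}{j}\,n^{b-j-1}\,
\frac{\Gamma(n-i+j+1)\,\Gamma(b+f+i-j-1)}
{\bigl(c+\sum_k t_k\bigr)^{n-i+j+1}\bigl(g+nc+\sum_k (n-k+1)t_k\bigr)^{b+f+i-j-1}}}
{\displaystyle\sum_{i=0}^{n-1}\sum_{j=0}^{b-1} a_i\binom{b-1}{j}\,(n-1)^{b-j-1}\,
\frac{\Gamma(n-i+j+1)\,\Gamma(b+f+i-j-1)}
{\bigl(c+\sum_k t_k\bigr)^{n-i+j+1}\bigl(g+c(n-1)+\sum_k (n-k)t_k\bigr)^{b+f+i-j-1}}}
\tag{A10}
\]

where $a_i$ and $a'_i$ are the coefficients of $x^{n-i}\phi^{i}$ in 
$\prod_{m=0}^{n-1}(x+m\phi)$ and $\prod_{m=1}^{n}(x+m\phi)$ respectively. 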

These expressions do not present insuperable computational 
difficulties. The main problem is one of eliciting the prior 
information in the form of the numbers b(integer), c, f and g. 
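
As an indication of how little computation is involved, the double sums can be evaluated directly. The sketch below follows one reading of (A7) to (A10): the survival function is [(A7) + (A9) - (A8)] / (A9) and the probability of perfection is 1 - (A8)/(A9), with the polynomial coefficients obtained by expanding the products numerically. The function names are hypothetical, and plain gamma functions are used, so this is only suitable for small n.

```python
import math

def expand_product(multipliers):
    """Coefficients a[i] of x^(m-i) * phi^i in the product of (x + k*phi) over k."""
    coeffs = [1.0]
    for k in multipliers:
        nxt = coeffs + [0.0]
        for i, a in enumerate(coeffs):
            nxt[i + 1] += a * k
        coeffs = nxt
    return coeffs

def double_sum(coeffs, shift, n, b, f, x_rate, phi_rate):
    """Sum over i, j of the gamma-integral terms appearing in (A7)-(A9)."""
    total = 0.0
    for i, a in enumerate(coeffs):
        for j in range(b):
            weight = a * math.comb(b - 1, j) * shift ** (b - 1 - j)
            p = n - i + j + 1       # gamma shape for x
            q = b + f + i - j - 1   # gamma shape for phi
            total += weight * math.gamma(p) * math.gamma(q) / (x_rate ** p * phi_rate ** q)
    return total

def _pieces(times, b, c, f, g):
    n = len(times)
    total = sum(times)
    w_next = sum((n - k + 1) * t for k, t in enumerate(times, start=1))
    w_curr = sum((n - k) * t for k, t in enumerate(times, start=1))
    upper = expand_product(range(1, n + 1))   # product used in (A7), (A8)
    lower = expand_product(range(0, n))       # product used in (A9)

    def s7(t):
        return double_sum(upper, n, n, b, f, c + t + total, g + n * c + w_next)

    s9 = double_sum(lower, n - 1, n, b, f, c + total, g + (n - 1) * c + w_curr)
    return s7, s9

def bjm_survival(t, times, b, c, f, g):
    """P(next inter-failure time > t | data), b a positive integer."""
    s7, s9 = _pieces(times, b, c, f, g)
    return (s7(t) + s9 - s7(0.0)) / s9

def bjm_prob_perfect(times, b, c, f, g):
    """Posterior probability that no faults remain after the n-th fix."""
    s7, s9 = _pieces(times, b, c, f, g)
    return 1.0 - s7(0.0) / s9
```

For a single observation with b = f = 1 the result can be checked by hand: the posterior of λ is gamma with shape 2 and rate c + t_1, φ is exponential with rate g, and the probability of perfection is ((c + t_1)/(c + t_1 + g))^2.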

There are various ways in which, in principle, "your" prior 
beliefs about, say, $\lambda$ could be elicited within the gamma 
distribution framework. You could be asked to give your best 
guess of the mean and variance of your beliefs about $\lambda$. Such an 
approach does not seem to represent how we "naturally" think about 
uncertainty. An alternative approach would be to ask "you" to fix 
two percentiles of $\lambda$, say the 25% and 75% points. From either of 
these approaches it is a simple matter to calculate "your" b and 
c. A more satisfactory approach might involve a certain amount of 
feedback, with "you" being able to see the consequences of your 
choices of b and c and modify them. This problem is one which is 
central to Bayesian inference, and it is not appropriate to dwell 
on it at length here. 
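
The mean-and-variance route can be written down in a few lines. This is a sketch assuming the gamma prior has density proportional to λ^(b-1) e^(-cλ), so that its mean is b/c and its variance b/c²; rounding b to the nearest positive integer and then choosing c to preserve the stated mean is a choice made here for illustration, not a recommendation from the paper.

```python
def gamma_prior_from_moments(mean, variance):
    """Return (b, c) for a gamma prior with integer shape b and rate c."""
    if mean <= 0.0 or variance <= 0.0:
        raise ValueError("mean and variance must be positive")
    # Exact moment matching gives b = mean^2 / variance; b must be an integer.
    b = max(1, round(mean * mean / variance))
    # Keep the elicited mean exactly; the implied variance becomes mean^2 / b.
    c = b / mean
    return b, c

b, c = gamma_prior_from_moments(2.0, 1.0)          # shape 4, rate 2
b2, c2 = gamma_prior_from_moments(3.0, 2.5)        # shape rounds from 3.6 to 4
implied_variance = b2 / c2 ** 2                    # 2.25 rather than 2.5
```

The second call shows the price of the integer restriction: the mean is reproduced exactly, but the implied variance differs from the one elicited.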

Table 1: Analysis of Musa System 3 data. Here n represents the 
number of the observation, and the sample size in the ML calculations and 
Bayesian analysis; $t_n$ is the nth inter-failure time measured in 
seconds. $\hat{N}$, $\hat{\phi}$ are the ML estimates of N, $\phi$ in the JM model based on n 
observations. Also tabulated are the probability integral 
transforms of $t_{n+1}$ using the predictor distributions based on 
$t_1, \dots, t_n$ (see (40) for JM and (41) for BJM), and 
the probability that the program is perfect, i.e. that the last fault 
has been removed, for the BJM model. 

Table 2: Same structure as Table 1, for Musa System 40 data. 

Figure 1: Failure rate versus failure number for JM model: 
$\lambda_i = (N - i + 1)\phi$. 

Figure 2: Dots represent failure rate versus failure number as 
we suggest it ought to be: i.e. early fixes have greater effect 
than later ones. Crosses represent "best fitting" JM linear 
failure rate function. 

Plot 1 Procedure 1 for Musa System 3 data, sample sizes here 
range from 18 through 37; JM model is represented by crosses, 
BJM by dots. 

Plot 2 Procedure 2 for Musa System 3 data, same sample size range 
as in Plot 1. Again JM represented by crosses, BJM by dots. For 
clarity the actual step-function sample cdf's are shown. 


Plot 3 As Plot 1, for Musa System 40 data; sample sizes 51 through 


Plot 5 As Plot 1, for Musa System 1 data; sample sizes 30 
through 129. 

Plot 6 As Plot 2, same data as for Plot 5. Here BJM and JM 
are extremely close and only JM is shown. The line with smallest 
slope shows the closeness to linearity of points 34 through 90 
(see text). 

Plot 7 As Plot 1, for Musa System 2 data; sample sizes 14 through 
53. 

Plot 8 As Plot 2, same data as for Plot 7. 